Type-IV DCT, DST, and MDCT algorithms
with reduced numbers of arithmetic operations 
Abstract: We present algorithms for the type-IV discrete
cosine transform (DCT-IV) and discrete sine transform (DST-IV),
as well as for the modified discrete cosine transform (MDCT)
and its inverse, that achieve a lower count of real multiplications
and additions than previously published algorithms, without
sacrificing numerical accuracy. Asymptotically, the operation
count is reduced from $2N \log_2 N + O(N)$ to $\frac{17}{9} N \log_2 N + O(N)$
for a power-of-two transform size $N$, and the exact count is
strictly lowered for all $N \ge 8$. These results are derived by
considering the DCT to be a special case of a DFT of length $8N$,
with certain symmetries, and then pruning redundant operations 
from a recent improved fast Fourier transform algorithm (based 
on a recursive rescaling of the conjugate-pair split-radix algo- 
rithm). The improved algorithms for DST-IV and MDCT follow 
immediately from the improved count for the DCT-IV. 

Index Terms — discrete cosine transform; lapped transform; 
fast Fourier transform; arithmetic complexity 



I. Introduction 

In this paper, we present recursive algorithms for type-IV 
discrete cosine and sine transforms (DCT-IV and DST-IV) 
and modified discrete cosine transforms (MDCTs), of power- 
of-two sizes N, that require fewer total real additions and 
multiplications (herein called flops) than previously published 
algorithms (with an asymptotic reduction of about 6%), with- 
out sacrificing numerical accuracy. This work, extending our 
previous results for small fixed N [1], appears to be the 
first time in over 20 years that flop counts for the DCT- 
IV and MDCT have been reduced — although computation 
times are no longer generally determined by arithmetic counts 
[2], the question of the minimum number of flops remains 
of fundamental theoretical interest. Our fast algorithms are 
based on one of two recently published fast Fourier transform
(FFT) algorithms [1], [3], which reduced the operation
count for the discrete Fourier transform (DFT) of size $N$ to
$\frac{34}{9} N \log_2 N + O(N)$ compared to the (previous best) split-radix
algorithm's $4N \log_2 N + O(N)$ [4]-[8]. Given the new
FFT, we treat a DCT as an FFT of real-symmetric inputs and
eliminate redundant operations to derive the new algorithm; in
other work, we applied the same approach to derive improved
algorithms for the type-II and type-III DCT and DST [9].

The algorithm for DCT-IV that we present has the same 
recursive structure as some previous DCT-IV algorithms, but 
the subtransforms are recursively rescaled in order to eliminate 
some of the multiplications. This approach reduces the flop 
count for the DCT-IV from the previous best of $2N \log_2 N + N$

* Department of Mathematics, Massachusetts Institute of Technology,
Cambridge MA 02139.



     N    Previous DCT-IV    New algorithm
     8            56               54
    16           144              140
    32           352              338
    64           832              800
   128          1920             1838
   256          4352             4164
   512          9728             9290
  1024         21504            20520
  2048         47104            44902
  4096        102400            97548

TABLE I
Flop counts (real adds + mults) of previous best DCT-IV and our new algorithm



[10]-[16] to:

$$\frac{17}{9} N \log_2 N + \frac{31}{27} N + \frac{2}{9} (-1)^{\log_2 N} \log_2 N - \frac{4}{27} (-1)^{\log_2 N}. \qquad (1)$$
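As a quick consistency check, eq. (1) can be evaluated exactly and compared with the entries of Table I. The following is a minimal sketch; the function names are hypothetical and are not taken from the paper, and exact rational arithmetic is used only to avoid rounding.

```python
from fractions import Fraction

def previous_dct4_flops(N):
    """Previous best count 2 N log2 N + N, for N a power of two."""
    L = N.bit_length() - 1  # log2 N
    return 2 * N * L + N

def new_dct4_flops(N):
    """Eq. (1), evaluated with exact rational arithmetic."""
    L = N.bit_length() - 1  # log2 N
    sign = (-1) ** L
    count = (Fraction(17, 9) * N * L + Fraction(31, 27) * N
             + Fraction(2, 9) * sign * L - Fraction(4, 27) * sign)
    # The count is an integer for every size listed in Table I.
    assert count.denominator == 1
    return int(count)

# Print the rows of Table I: N, previous count, new count.
for m in range(3, 13):
    N = 2 ** m
    print(N, previous_dct4_flops(N), new_dct4_flops(N))
```

The ratio of the two leading coefficients, (17/9)/2, is about 0.944, which is the asymptotic reduction of roughly 6% quoted above.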

The first savings occur for $N = 8$, and are summarized in
Table I. In order to derive a DCT-IV algorithm from the new
FFT algorithm, we simply consider the DCT-IV to be a special 
case of a DFT with real input of a certain symmetry, and 
discard the redundant operations. [This should not be confused 
with algorithms that employ an unmodified FFT combined 
with $O(N)$ pre/post-processing steps to obtain the DCT.] This
well-known technique [1], [2], [7]-[9], [17], [18] allows any 
improvements in the DFT to be immediately translated to the 
DCT-IV, is simple to derive, avoids cumbersome re-invention 
of the "same" algorithm for each new trigonometric transform, 
and (starting with a split-radix FFT) matches the best previous 
flop counts for every type of DCT and DST. The connection 
to a DFT of symmetric data can also be viewed as the basic 
reason why DCT flop counts had not improved for so long: as 
we review below, the old DCT flop counts can be derived from 
a split-radix algorithm [15], and the 1968 flop count of split 
radix was only recently improved upon [1], [3]. There have 
been many previously published DCT-IV algorithms derived 
by a variety of techniques, some achieving $2N \log_2 N + N$
flops [10]-[16] and others obtaining larger or unreported flop 
counts [19]-[22]. Furthermore, exactly the same flop count 
(1) is now obtained for the type-IV discrete sine transform 
(DST-IV), since a DST-IV can be obtained from a DCT-IV via 
flipping the sign of every other input (zero flops, since the sign 
changes can be absorbed by converting subsequent additions 
into subtractions or vice versa) and a permutation of the output 
(zero flops) [12]. Also, in many practical circumstances the 
output can be scaled by an arbitrary factor (since any scaling 
can be absorbed into a subsequent computation); in this case,
similar to the well-known savings for a scaled-output size-8
DCT-II in JPEG compression [1], [9], [23], [24], we show
that an additional $N$ multiplications can be saved for a
scaled-output (or scaled-input) DCT-IV.

Indeed, if we only wished to show that the asymptotic flop
count for DCT-IV could be reduced to $\frac{17}{9} N \log_2 N + O(N)$,
we could simply apply known algorithms to express a DCT-IV
in terms of a real-input DFT (e.g. by reducing it to the DCT-III
[10] and thence to a real-input DFT [25]) to immediately
apply the $\frac{17}{9} N \log_2 N + O(N)$ flop count for a real-input DFT
from our previous paper [1]. However, with FFT and DCT
algorithms, there is great interest in obtaining not only the best
possible asymptotic constant factor, but also the best possible
exact count of arithmetic operations. Our result (1) is intended
as a new upper bound on this (still unknown) minimum exact
count, and therefore we have done our best with the $O(N)$
terms as well as the asymptotic constant.

An important transform closely related to the DCT-IV is an
MDCT, which takes $2N$ inputs and produces $N$ outputs, and
is designed to be applied to 50%-overlapped blocks of data
[26], [27]. Such a "lapped" transform reduces artifacts from
block boundaries and is widely used in audio compression
[28]. In fact, an MDCT is exactly equivalent to a DCT-IV of
size $N$, where the $2N$ inputs have been preprocessed with
$N$ additions/subtractions [29]-[32]. This means that the flop
count for an MDCT is at most that of a DCT-IV plus $N$
flops. Precisely this technique led to the best previous flop
count for an MDCT, $2N \log_2 N + 2N$ [29]-[32]. (There have
also been several MDCT algorithms published with larger
or unreported flop counts [33]-[42].) It also means that our
improved DCT-IV immediately produces an improved MDCT,
with a flop count of eq. (1) plus $N$. The same holds for the inverse
MDCT (IMDCT), which takes $N$ inputs to $2N$ outputs and
is equivalent to a DCT-IV of size $N$ plus $N$ negations
(which should not, we argue, be counted in the flops because
they can be absorbed by converting subsequent additions into
subtractions).
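The equivalence between an MDCT and a size-$N$ DCT-IV of preprocessed inputs can be checked numerically. The sketch below is only an illustration and not the fast algorithm of this paper: it assumes the common MDCT convention $X_k = \sum_{n=0}^{2N-1} x_n \cos[\frac{\pi}{N}(n + \frac12 + \frac{N}{2})(k + \frac12)]$ (the MDCT formula itself is not given in this section), it uses a direct $O(N^2)$ DCT-IV, and all function names are hypothetical.

```python
import numpy as np

def dct4(y):
    """Direct O(N^2) unnormalized DCT-IV, used here only as a reference."""
    N = len(y)
    n = np.arange(N)
    return np.cos(np.pi / N * np.outer(n + 0.5, n + 0.5)) @ y

def mdct_direct(x):
    """Direct MDCT of 2N inputs (assumed convention, see the lead-in)."""
    N = len(x) // 2
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2)) @ x

def mdct_via_dct4(x):
    """MDCT as a size-N DCT-IV after folding the 2N inputs (N must be even)."""
    a, b, c, d = np.split(np.asarray(x, dtype=float), 4)
    # Folding costs N additions/subtractions in total; the leading negation
    # can be absorbed into the subtraction.
    folded = np.concatenate((-c[::-1] - d, a - b[::-1]))
    return dct4(folded)

x = np.random.default_rng(0).standard_normal(16)
print(np.allclose(mdct_direct(x), mdct_via_dct4(x)))
```

Replacing `dct4` by any fast DCT-IV leaves the folding step unchanged, which is why the MDCT count is the DCT-IV count plus $N$.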

In the following sections, we first briefly review the new 
FFT algorithm, previously described in detail [1]. Then, we 
review how a DCT-IV may be expressed as a special case of a 
real DFT, and how the new DCT-IV algorithm may be derived 
by applying the new FFT algorithm and pruning the redundant 
operations. In doing so, we find it necessary to develop an
algorithm for a DCT-III with scaled output. (Previously, we
had derived a fast algorithm for a DCT-III based on the
new FFT, but only for the case of scaled or unscaled input
[9].) This DCT-III algorithm follows the same approach of
eliminating redundant operations from our new scaled-output
FFT (a subtransform of the new FFT) applied to appropriate
real-symmetric inputs. We then analyze the flop counts for
the DCT-III and DCT-IV algorithms. Finally, we show that 
this improved DCT-IV immediately leads to improved DST- 
IV, MDCT, and IMDCT algorithms. We close with some 
concluding remarks about future directions. 



II. Review of the new FFT 

To obtain the new FFT, we used as our starting point a 
variation called the conjugate-pair FFT of the well-known 
split-radix algorithm. Here, we first review the conjugate-pair 
FFT, and then briefly summarize how this was modified to 
reduce the number of flops. 

A. Conjugate-pair FFT 

The discrete Fourier transform of size $N$ is defined by

$$X_k = \sum_{n=0}^{N-1} x_n \omega_N^{nk}, \qquad (2)$$

where $\omega_N = e^{-2\pi i/N}$ is an $N$th primitive root of unity and
$k = 0, \ldots, N-1$.

Starting with this equation, the decimation-in-time
conjugate-pair FFT [1], [43], a variation on the well-known
split-radix algorithm [4]-[7], splits it into three smaller DFTs:
one of size $N/2$ of the even-indexed inputs, and two of size
$N/4$:

$$X_k = \sum_{n_2=0}^{N/2-1} \omega_{N/2}^{n_2 k} x_{2n_2} + \omega_N^{k} \sum_{n_4=0}^{N/4-1} \omega_{N/4}^{n_4 k} x_{4n_4+1} + \omega_N^{-k} \sum_{n_4=0}^{N/4-1} \omega_{N/4}^{n_4 k} x_{4n_4-1}. \qquad (3)$$

[In contrast, the ordinary split-radix FFT uses $x_{4n_4+3}$ for the
third sum (a cyclic shift of $x_{4n_4-1}$), with a corresponding
multiplicative "twiddle" factor of $\omega_N^{3k}$.] This decomposition is
repeated recursively until base cases of size $N = 1$ or $N = 2$
are reached. The number of flops required by this algorithm,
after certain simplifications (common subexpression elimination
and constant folding) and not counting data-independent
operations like the computation of $\omega_N^k$, is $4N \log_2 N - 6N + 8$,
identical to ordinary split radix [1], [44]-[46].
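The decomposition in eq. (3) can be written down directly as a recursive routine. The following is a minimal sketch for illustration only: it makes no attempt to reach the flop counts discussed here (twiddle factors are recomputed at every level and no subexpressions are shared), the function name is hypothetical, and the index $4n_4 - 1$ is taken modulo $N$.

```python
import numpy as np

def conjugate_pair_fft(x):
    """Recursive conjugate-pair split-radix FFT; len(x) must be a power of two."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x.copy()
    if N == 2:
        return np.array([x[0] + x[1], x[0] - x[1]])
    U = conjugate_pair_fft(x[0::2])                # size N/2: x_{2n}
    Z = conjugate_pair_fft(x[1::4])                # size N/4: x_{4n+1}
    Zp = conjugate_pair_fft(np.roll(x, 1)[0::4])   # size N/4: x_{4n-1} (indices mod N)
    k = np.arange(N // 4)
    w = np.exp(-2j * np.pi * k / N)                # twiddle factors omega_N^k
    s = w * Z + np.conj(w) * Zp
    d = w * Z - np.conj(w) * Zp
    X = np.empty(N, dtype=complex)
    X[:N // 4] = U[:N // 4] + s                    # X_k
    X[N // 2:3 * N // 4] = U[:N // 4] - s          # X_{k+N/2}
    X[N // 4:N // 2] = U[N // 4:] - 1j * d         # X_{k+N/4}
    X[3 * N // 4:] = U[N // 4:] + 1j * d           # X_{k+3N/4}
    return X

x = np.random.default_rng(0).standard_normal(32)
print(np.allclose(conjugate_pair_fft(x), np.fft.fft(x)))
```

The two quarter-size subtransforms are multiplied by the complex-conjugate pair $\omega_N^k$ and $\omega_N^{-k}$, which is the property that the rescaling described next exploits.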

B. New FFT 

Based on the conjugate-pair split-radix FFT from section II-A,
a new FFT algorithm with a reduced number of flops
can be derived by scaling the subtransforms [1]. We will not
reproduce the derivation here, but will simply summarize the
results. In particular, the original conjugate-pair split-radix
algorithm is split into four mutually recursive algorithms,
$\mathrm{newfftS}^{\ell}_{N}(x)$ for $\ell = 0, 1, 2, 4$, each of which has the same
split-radix structure but computes a DFT scaled by a factor
of $1/s_{\ell N,k}$ (defined below), respectively. These algorithms
are shown in Algorithm 1, in which the scaling factors are
combined with the twiddle factors $\omega_N^k$ to reduce the total
number of multiplications. In particular, all of the savings occur
in $\mathrm{newfftS}^{1}_{N}(x)$, while $\mathrm{newfftS}^{4}_{N}(x)$ is factorized into a
special form to minimize the number of extra multiplications it
requires. Here, although $\ell = 0, 1, 2$ are presented in a compact
form by a single subroutine $\mathrm{newfftS}^{\ell}_{N}(x)$, in practice they
would have to be implemented as three separate subroutines in
order to exploit the special cases of the multiplicative constants
$s_{N,k}/s_{\ell N,k}$, as described in Ref. [1]. For simplicity, we have
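Algorithm 1 New FFT algorithm of length $N$ (divisible by
4). The sub-transforms $\mathrm{newfftS}^{\ell}_{N}(x)$ for $\ell \ne 0$ are scaled
by $s_{\ell N,k}$, respectively, while $\ell = 0$ is the final unscaled DFT
($s_{0,k} = 1$).

function $X_{k=0..N-1} \leftarrow \mathrm{newfftS}^{\ell}_{N}(x_n)$:
  {computes DFT / $s_{\ell N,k}$, $\ell = 0, 1, 2$}
  $U_{k_2=0 \ldots N/2-1} \leftarrow \mathrm{newfftS}^{2\ell}_{N/2}(x_{2n_2})$
  $Z_{k_4=0 \ldots N/4-1} \leftarrow \mathrm{newfftS}^{1}_{N/4}(x_{4n_4+1})$
  $Z'_{k_4=0 \ldots N/4-1} \leftarrow \mathrm{newfftS}^{1}_{N/4}(x_{4n_4-1})$
  for $k = 0$ to $N/4 - 1$ do
    $X_k \leftarrow U_k + (t_{N,k} Z_k + t^*_{N,k} Z'_k) \cdot (s_{N,k}/s_{\ell N,k})$
    $X_{k+N/2} \leftarrow U_k - (t_{N,k} Z_k + t^*_{N,k} Z'_k) \cdot (s_{N,k}/s_{\ell N,k})$
    $X_{k+N/4} \leftarrow U_{k+N/4} - i (t_{N,k} Z_k - t^*_{N,k} Z'_k) \cdot (s_{N,k}/s_{\ell N,k+N/4})$
    $X_{k+3N/4} \leftarrow U_{k+N/4} + i (t_{N,k} Z_k - t^*_{N,k} Z'_k) \cdot (s_{N,k}/s_{\ell N,k+N/4})$
  end for

function $X_{k=0..N-1} \leftarrow \mathrm{newfftS}^{4}_{N}(x_n)$:
  {computes DFT / $s_{4N,k}$}
  $U_{k_2=0 \ldots N/2-1} \leftarrow \mathrm{newfftS}^{2}_{N/2}(x_{2n_2})$
  $Z_{k_4=0 \ldots N/4-1} \leftarrow \mathrm{newfftS}^{1}_{N/4}(x_{4n_4+1})$
  $Z'_{k_4=0 \ldots N/4-1} \leftarrow \mathrm{newfftS}^{1}_{N/4}(x_{4n_4-1})$
  for $k = 0$ to $N/4 - 1$ do
    $X_k \leftarrow [U_k + (t_{N,k} Z_k + t^*_{N,k} Z'_k)] \cdot (s_{N,k}/s_{4N,k})$
    $X_{k+N/2} \leftarrow [U_k - (t_{N,k} Z_k + t^*_{N,k} Z'_k)] \cdot (s_{N,k}/s_{4N,k+N/2})$
    $X_{k+N/4} \leftarrow [U_{k+N/4} - i (t_{N,k} Z_k - t^*_{N,k} Z'_k)] \cdot (s_{N,k}/s_{4N,k+N/4})$
    $X_{k+3N/4} \leftarrow [U_{k+N/4} + i (t_{N,k} Z_k - t^*_{N,k} Z'_k)] \cdot (s_{N,k}/s_{4N,k+3N/4})$
  end for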



omitted the base cases of the recursion (N = 1 and 2) and 
have not eliminated common subexpressions. 

The key aspect of these algorithms is the scale factor $s_{N,k}$,
where the subtransforms compute the DFT scaled by $1/s_{\ell N,k}$
for $\ell = 1, 2, 4$. This scale factor is defined for $N = 2^m$ by the
following recurrence, where $k_4 = k \bmod \frac{N}{4}$:

$$s_{N=2^m,k} = \begin{cases} 1 & \text{for } N \le 4 \\ s_{N/4,k_4} \cos(2\pi k_4/N) & \text{for } k_4 \le N/8 \\ s_{N/4,k_4} \sin(2\pi k_4/N) & \text{otherwise} \end{cases} \qquad (4)$$

This definition has the properties: $s_{N,0} = 1$, $s_{N,k+N/4} = s_{N,k}$,
and $s_{N,N/4-k} = s_{N,k}$. Also, $s_{N,k} > 0$ and decays rather
slowly with $N$: $s_{N,k}$ is $\Omega(N^{\log_4 \cos(\pi/5)})$ asymptotically [1].
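The recurrence in eq. (4) is short enough to evaluate directly, which makes the stated properties easy to check numerically. The sketch below uses hypothetical function names; `twiddle` computes the combined factor $t_{N,k} = \omega_N^k s_{N/4,k}/s_{N,k}$ that appears in Algorithm 1, assuming $\omega_N = e^{-2\pi i/N}$.

```python
import cmath
import math

def scale(N, k):
    """Scale factor s_{N,k} of eq. (4); N must be a power of two."""
    if N <= 4:
        return 1.0
    k4 = k % (N // 4)
    angle = 2 * math.pi * k4 / N
    trig = math.cos(angle) if k4 <= N // 8 else math.sin(angle)
    return scale(N // 4, k4) * trig

def twiddle(N, k):
    """Combined factor t_{N,k} = omega_N^k * s_{N/4,k} / s_{N,k}."""
    return cmath.exp(-2j * math.pi * k / N) * scale(N // 4, k) / scale(N, k)

N = 64
for k in range(N // 4):
    t = twiddle(N, k)
    # Either the real part is +1 (tangent form) or the imaginary part is -1
    # (cotangent form), so one of the real multiplications is trivial.
    print(k, round(scale(N, k), 6), round(t.real, 6), round(t.imag, 6))
```

With this sign convention the factors come out as $1 - i\tan(2\pi k/N)$ for $k \le N/8$ and $\cot(2\pi k/N) - i$ otherwise.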
When these scale factors are combined with the twiddle factors
$\omega_N^k$, we obtain terms of the form

$$t_{N,k} = \omega_N^k \frac{s_{N/4,k}}{s_{N,k}}, \qquad (5)$$

which is always a complex number of the form $\pm 1 \pm i \tan\frac{2\pi k}{N}$
or $\pm \cot\frac{2\pi k}{N} \pm i$ and can therefore be multiplied with two fewer
real multiplications than are required to multiply by $\omega_N^k$. We
denote the complex conjugate of $t_{N,k}$ by $t^*_{N,k}$. The resulting
flop count, for arbitrary complex data $x_n$, is then reduced from
$4N \log_2 N - 6N + 8$ for split radix to $\frac{34}{9} N \log_2 N + O(N)$
[1].

III. Fast DCT-IV from new FFT

Various forms of discrete cosine transform have been defined,
corresponding to different boundary conditions on the
transform. The type-IV DCT is defined as a real, linear
transformation by the formula:

$$C_k^{\mathrm{IV}} = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac12\right)\left(k + \frac12\right)\right] \qquad (6)$$

for $N$ inputs $x_n$ and $N$ outputs $C_k^{\mathrm{IV}}$. The transform can be
made orthogonal (unitary) by multiplying with a normalization
factor $\sqrt{2/N}$, but for our purposes the unnormalized form
is more convenient (and has no effect on the number of
operations). We will now derive an algorithm, starting from
the new FFT of the previous section, to compute the DCT-IV
in terms of a scaled DCT-III and DST-III. These type-III
transforms are then treated in the following section by a similar
method, and lead to our new flop count for the DCT-IV.

In particular, we wish to emphasize in this paper that the
DCT-IV (and, indeed, all types of DCT) can be viewed as
special cases of the discrete Fourier transform (DFT) with
real inputs of a certain symmetry. This viewpoint is fruitful
because it means that any FFT algorithm for the DFT leads
immediately to a corresponding fast algorithm for the DCT-IV
simply by discarding the redundant operations, rather
than rederiving a "new" algorithm from scratch. A similar
viewpoint has been used to derive fast algorithms for the DCT-II
[1], [2], [7], [8], [17], [18], as well as in automatic code-generation
for the DCT-IV [1], [2], and has been observed to
lead to the minimum known flop count starting from the best
known DFT algorithm. Furthermore, because the algorithm
is equivalent to an FFT algorithm with certain inputs, it
should have the same floating-point error characteristics as that
FFT. In this case, the underlying FFT algorithm is simply a
rescaling of split radix [1], and therefore inherits the favorable
$O(\sqrt{\log N})$ mean error growth and $O(\log N)$ error bounds
of the Cooley-Tukey algorithm [47]-[49], unlike at least one
other DCT-IV algorithm [12] that has been observed to display
$O(\sqrt{N})$ error growth [2].

A. DCT-IV in terms of DFT

Recall that the discrete Fourier transform of size $N$
is defined by eq. (2). In order to derive $C_k^{\mathrm{IV}}$ from
the DFT formula, one can use the identity
$\cos\frac{\pi\ell}{N} = \frac14\left(\omega_{8N}^{4\ell} - \omega_{8N}^{4N-4\ell} - \omega_{8N}^{4N+4\ell} + \omega_{8N}^{8N-4\ell}\right)$ to write:

$$C_k^{\mathrm{IV}} = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac12\right)\left(k + \frac12\right)\right] = \sum_{n=0}^{8N-1} \tilde{x}_n \omega_{8N}^{n(2k+1)}, \qquad (7)$$
where $\tilde{x}_n$ is a real-even sequence of length $\tilde{N} = 8N$, defined
as follows for $0 \le n < N$:

$$\tilde{x}_{2n+1} = \tilde{x}_{8N-2n-1} = \frac14 x_n, \qquad (8)$$

$$\tilde{x}_{4N-2n-1} = \tilde{x}_{4N+2n+1} = -\frac14 x_n. \qquad (9)$$

Furthermore, the even-indexed inputs are zeros: $\tilde{x}_{2n} = 0$ for
all $0 \le n < 4N$. (The factors of 1/4 will disappear in the end
because they cancel equivalent factors in the subtransforms.)

Fig. 1. A DCT-IV of length $N = 4$ (open dots $x_0, x_1, x_2, x_3$) is equivalent
to a size $8N = 32$ DFT via interleaving with zeros (black dots) and extending
to an odd-even-odd (square dots) periodic (gray dots) sequence.

This is illustrated by an example, for $N = 4$, in Fig. 1. The
original four inputs of the DCT-IV are shown as open dots,
which are interleaved with zeros (black dots) and extended to
an odd-even-odd (square dots) periodic (gray dots) sequence
of length $8N = 32$ for the corresponding DFT. Referring to
eq. (7), the output of the DCT-IV is given by the first $N$ odd-index
outputs of the corresponding DFT (with a scale factor
of 1/4). (The type-IV DCT is distinguished from the other
types by the fact that it is even around the left boundary of
the original data while it is odd around the right boundary,
and the symmetry points fall halfway in between pairs of the
original data points.) We will refer, below, to this figure in
order to illustrate what happens when an FFT algorithm is
applied to this real-symmetric zero-interleaved data.
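The construction of this subsection can be verified numerically by building the extended sequence explicitly and handing it to an ordinary FFT. The sketch below is for illustration only (it performs the full size-$8N$ DFT rather than pruning anything); it assumes the symmetric extension $\tilde{x}_{2n+1} = \tilde{x}_{8N-2n-1} = x_n/4$, $\tilde{x}_{4N-2n-1} = \tilde{x}_{4N+2n+1} = -x_n/4$ with zeros at even indices, and the function names are hypothetical.

```python
import numpy as np

def dct4_direct(x):
    """Direct O(N^2) unnormalized DCT-IV."""
    N = len(x)
    n = np.arange(N)
    return np.cos(np.pi / N * np.outer(n + 0.5, n + 0.5)) @ x

def dct4_via_dft(x):
    """DCT-IV read off from the odd outputs of a size-8N DFT."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    xt = np.zeros(8 * N)              # even-indexed entries stay zero
    xt[2 * n + 1] = x / 4             # original data, interleaved with zeros
    xt[8 * N - 2 * n - 1] = x / 4     # even extension around n = 0
    xt[4 * N - 2 * n - 1] = -x / 4    # odd extension around n = 2N
    xt[4 * N + 2 * n + 1] = -x / 4
    X = np.fft.fft(xt)                # uses exp(-2 pi i n k / (8N)), like omega_{8N}
    # The input is real and even, so the DFT is real up to rounding.
    return X[2 * n + 1].real

x = np.random.default_rng(0).standard_normal(4)
print(np.allclose(dct4_direct(x), dct4_via_dft(x)))
```

Only $N$ of the $8N$ inputs are independent and only $N$ outputs are needed, which is the redundancy that the pruned algorithm removes.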

B. DCT-IV from DCT/DST-III 

For a DCT-IV of size $N$, our strategy is to directly apply the
new FFT algorithm to the equivalent DFT of size $\tilde{N} = 8N$,
and to discard the redundant operations from each stage. As it
turns out, the sub-transforms after one step of this algorithm
are actually scaled-output type-III DCTs and DSTs. This is
closely related to a well-known algorithm to express a DCT-IV
in terms of a half-size DCT-III and DST-III [10]. In this
section, we derive this reduction to a DCT-III for the new FFT
algorithm, and then in Sec. IV we derive a new algorithm
for the scaled-output DCT-III. The scaled-output DST-III
algorithm can be easily re-expressed in terms of the DCT-III,
as is presented in Sec. IV-C. Here, we define the DCT-III
and DST-III, respectively, by the (unnormalized) equations:

$$C_k^{\mathrm{III}} = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi}{N} n \left(k + \frac12\right)\right], \qquad (10)$$

$$S_k^{\mathrm{III}} = \sum_{n=1}^{N} x_n \sin\left[\frac{\pi}{N} n \left(k + \frac12\right)\right]. \qquad (11)$$

Starting with the DFT of length $\tilde{N}$ in eq. (2), the new
split-radix FFT algorithm splits it into three smaller DFTs:
$U_k = \mathrm{dft}(\tilde{x}_{2n_2})$ of size $\tilde{N}/2$, as well as the scaled transforms
$Z_k = \mathrm{dft}(2\tilde{x}_{4n_4+1})/s_{\tilde{N}/4,k}$ and $Z'_k = \mathrm{dft}(2\tilde{x}_{4n_4-1})/s_{\tilde{N}/4,k}$ of
size $\tilde{N}/4$ (including a factor of 2 for convenience). These are
combined via:

$$\tilde{X}_k = U_k + \omega_{\tilde{N}}^{k} s_{\tilde{N}/4,k} Z_k/2 + \omega_{\tilde{N}}^{-k} s_{\tilde{N}/4,k} Z'_k/2. \qquad (12)$$



Here, where $\tilde{x}_n$ comes from the DCT-IV as in Sec. III-A,
the even-indexed elements in $\tilde{x}_n$ are all zero, so $U_k = 0$.
Furthermore, by the even symmetry of $\tilde{x}_n$, we have $Z'_k = Z^*_k$
(complex conjugate of $Z_k$). Thus, we have $C_k^{\mathrm{IV}} = \tilde{X}_{2k+1} =
\mathrm{Re}(\omega_{8N}^{2k+1} s_{2N,2k+1} Z_{2k+1})$. We will now show that $Z_k$ is given
by combining a DCT-III and a DST-III.

In order to calculate $Z_k$, we denote for simplicity the inputs
of this subtransform by $z_n = 2\tilde{x}_{4n+1}$ for $0 \le n < 2N$.
Since $Z_k$ is the output of a real-input DFT of size $2N$,
we have $Z_{2N-k} = Z^*_k$. Thus, for any $0 \le k < N/2$,
$Z_{2(N-1-k)+1} = Z_{2N-(2k+1)} = Z^*_{2k+1}$. However, there is an
additional redundancy in this transform that we must exploit:
by inspection of the construction of $\tilde{x}_n$ and by reference to
Fig. 2, we see that the inputs $z_n$ are actually a real anti-periodic
sequence of length $N$ (which becomes periodic when it is
extended to length $2N$). We must exploit this symmetry in
order to avoid wasting operations.

Fig. 2. The DCT-IV of size 4 (open dots) is computed, in a split-radix
conjugate-pair FFT of the $\tilde{N} = 32$ extended data $\tilde{x}_n$ from Fig. 1, via the
DFT $Z_k$ of the circled points $\tilde{x}_{4n+1}$, which is an anti-periodic sequence of
length $2N = 8$.

In particular, by using the anti-periodic symmetry of $z_n$, we
can write the DFT of length $2N$ as a single summation of
length $N$:

$$Z_{2k+1} = \frac{1}{s_{2N,2k+1}} \sum_{n=0}^{2N-1} z_n \omega_{2N}^{n(2k+1)}$$
$$= \frac{1}{s_{2N,2k+1}} \sum_{n=0}^{N-1} \left[ z_n \omega_{2N}^{n(2k+1)} + z_{n+N} \omega_{2N}^{(n+N)(2k+1)} \right]$$
$$= \frac{2}{s_{2N,2k+1}} \sum_{n=0}^{N-1} z_n \omega_{2N}^{n(2k+1)},$$

using the facts that $\omega_{2N}^{N(2k+1)} = -1$ and that
$z_{n+N} = -z_n$. Then, if we take the real and imaginary parts of
the third line above, we obtain precisely a DCT-III and a DST-III,
respectively, with outputs scaled by $1/s_{2N,2k+1}$. However,
these sub-transforms are actually of size $N/2$, because the
symmetry $\omega_{2N}^{(N-n)(2k+1)} = -\omega_{2N}^{-n(2k+1)}$ means that the $z_n$
and $z_{N-n}$ terms merely add or subtract in the input:

$$\mathrm{Re}\, Z_{2k+1} = \frac{1}{s_{2N,2k+1}} \left[ 2z_0 + \sum_{n=1}^{N/2-1} 2(z_n - z_{N-n}) \cos\frac{\pi n (k + \frac12)}{N/2} \right], \qquad (13)$$

$$\mathrm{Im}\, Z_{2k+1} = \frac{1}{s_{2N,2k+1}} \left[ (-1)^k (-2z_{N/2}) + \sum_{n=1}^{N/2-1} (-2)(z_n + z_{N-n}) \sin\frac{\pi n (k + \frac12)}{N/2} \right], \qquad (14)$$



for any $0 \le k < N/2$. We can define two new sequences $w_n$
($0 \le n < N/2$) and $v_n$ ($1 \le n \le N/2$) of length $N/2$ to be
the inputs of this DCT-III and DST-III, respectively:

$$w_0 = 2z_0, \quad w_{n>0} = 2(z_n - z_{N-n}), \qquad (15)$$

$$v_{n<N/2} = -2(z_n + z_{N-n}), \quad v_{N/2} = -2z_{N/2}. \qquad (16)$$

With this definition of $w_n$ and $v_n$, we can conclude from
eqs. (13-14), and the definition of DCT-III and DST-III, that
the real part of $Z_{2k+1}$ is exactly a scaled-output DCT-III of
$w_n$, while the imaginary part of $Z_{2k+1}$ is exactly a scaled-output
DST-III of $v_n$. (The scale factors of $\pm 2$ will disappear
in the end: they combine with the 2 in $z_n = 2\tilde{x}_{4n+1}$ to cancel
the 1/4 in the definition of $\tilde{x}_n$.)



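Algorithm 2 Fast DCT-IV algorithm in terms of scaled-output
DCT-III and DST-III, derived from Algorithm 1 by discarding
redundant operations.

function $C^{\mathrm{IV}}_{k=0..N-1} \leftarrow \mathrm{newdctIV}_{N}(x_n)$:
  {computes DCT-IV}
  $w_0 \leftarrow x_0$
  $v_{N/2} \leftarrow x_{N-1}$
  for $k = 1$ to $N/2 - 1$ do
    $w_k \leftarrow x_{2k} + x_{2k-1}$
    $v_k \leftarrow x_{2k-1} - x_{2k}$
  end for
  $W_{k=0 \ldots N/2-1} \leftarrow \mathrm{newdctIII}^{1}_{N/2}(w_n)$
  $V_{k=0 \ldots N/2-1} \leftarrow \mathrm{newdstIII}^{1}_{N/2}(v_n)$
  for $k = 0$ to $N/2 - 1$ do
    $Z_{2k+1} \leftarrow W_k + iV_k$
    $Z_{2(N-1-k)+1} \leftarrow W_k - iV_k$
  end for
  for $k = 0$ to $N - 1$ do
    $C_k^{\mathrm{IV}} \leftarrow \mathrm{Re}(\omega_{8N}^{2k+1} s_{2N,2k+1} Z_{2k+1})$
  end for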



Thus, we have shown that the first half of the sequence
$Z_{2k+1}$ ($0 \le k < N$) can be found from a scaled-output DCT-III
and DST-III of length $N/2$. The second half of the sequence
$Z_{2k+1}$ can be derived by the relation $Z_{2(N-1-k)+1} = Z^*_{2k+1}$
obtained earlier. Given $Z_{2k+1}$, the output of the original
DCT-IV, $C_k^{\mathrm{IV}}$, can be obtained by the formula $C_k^{\mathrm{IV}} =
\mathrm{Re}(\omega_{8N}^{2k+1} s_{2N,2k+1} Z_{2k+1})$. This algorithm, in which the
computation of $Z_k$ has been folded into the computation of $W_k$ and
$V_k$, is presented in Algorithm 2.

In Algorithm 2, $\mathrm{newdctIII}^{\ell}_{N}(w)$ calculates the DCT-III of
$\{w_n\}$ scaled by a factor of $1/s_{4\ell N,2k+1}$ for $\ell = 0, 1, 2, 4$, and
will be presented in Sec. IV. Similarly, $\mathrm{newdstIII}^{\ell}_{N}(v)$
calculates the DST-III of $\{v_n\}$ scaled by a factor of $1/s_{4\ell N,2k+1}$
for $\ell = 0, 1, 2, 4$, and will be presented in Sec. IV-C in terms
of $\mathrm{newdctIII}^{\ell}_{N}(w)$.

If the scale factors $s_{2N,2k+1}$ are removed (set to 1) in
Algorithm 2, we recover a decomposition of the DCT-IV
in terms of an ordinary (unscaled) DCT-III and DST-III
that was first described by Wang [10]. This well-known
algorithm yields a flop count exactly the same as previous
results: $2N \log_2 N + N$. (Wang obtained a slightly larger
count, apparently due to an error in adding his DCT-III and
DST-III counts.) The introduction of the scaling factors in
Algorithm 2 reduces the flop count by simplifying some of
the multiplications in the scaled DCT-III/DST-III compared
to their unscaled versions, as will be derived in Sec. IV.
Note that, in Algorithm 2, multiplying by $\omega_{8N}^{2k+1} s_{2N,2k+1}$
does not require any more operations than multiplying by
$\omega_{8N}^{2k+1}$, because the constant product $\omega_{8N}^{2k+1} s_{2N,2k+1}$ can be
precomputed. Let $M_S(N)$ denote the number of flops saved
in $\mathrm{newdctIII}^{1}_{N}(w)$ compared to the best-known unscaled
DCT-III. We shall prove in Sec. IV-C that the same number
of operations, $M_S(N)$, can be saved in $\mathrm{newdstIII}^{1}_{N}(v)$.
Thus, the total number of flops required by Alg. 2 will be
$2N \log_2 N + N - 2M_S(N/2)$. The formula for $M_S(N)$ will
be derived in Sec. IV, leading to the final DCT-IV flop count
formula given by eq. (1).
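The structure of Algorithm 2 with all scale factors set to 1 (the decomposition attributed to Wang above) can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the half-size subtransforms are evaluated by direct $O(N^2)$ sums instead of recursively, the DCT-III and DST-III are taken in the unnormalized forms $\sum_{n=0}^{N-1} x_n \cos[\pi n(k+\frac12)/N]$ and $\sum_{n=1}^{N} x_n \sin[\pi n(k+\frac12)/N]$, and the function names are hypothetical.

```python
import numpy as np

def dct3(x):
    """Direct unnormalized DCT-III; x holds x_0 .. x_{N-1}."""
    N = len(x)
    return np.cos(np.pi / N * np.outer(np.arange(N) + 0.5, np.arange(N))) @ x

def dst3(x):
    """Direct unnormalized DST-III; x holds x_1 .. x_N."""
    N = len(x)
    return np.sin(np.pi / N * np.outer(np.arange(N) + 0.5, np.arange(1, N + 1))) @ x

def dct4_direct(x):
    N = len(x)
    n = np.arange(N)
    return np.cos(np.pi / N * np.outer(n + 0.5, n + 0.5)) @ x

def dct4_unscaled_alg2(x):
    """Algorithm 2 with s = 1; len(x) must be even."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    h = N // 2
    w = np.empty(h)
    v = np.empty(h)                        # v[j] stores v_{j+1}
    w[0] = x[0]
    w[1:] = x[2:N:2] + x[1:N - 1:2]        # w_k = x_{2k} + x_{2k-1}
    v[:h - 1] = x[1:N - 1:2] - x[2:N:2]    # v_k = x_{2k-1} - x_{2k}
    v[h - 1] = x[N - 1]                    # v_{N/2} = x_{N-1}
    W, V = dct3(w), dst3(v)
    Z = np.empty(N, dtype=complex)         # Z[j] stores Z_{2j+1}
    k = np.arange(h)
    Z[k] = W + 1j * V
    Z[N - 1 - k] = W - 1j * V
    kk = np.arange(N)
    return (np.exp(-2j * np.pi * (2 * kk + 1) / (8 * N)) * Z).real

x = np.random.default_rng(0).standard_normal(8)
print(np.allclose(dct4_direct(x), dct4_unscaled_alg2(x)))
```

The input butterflies cost $N - 2$ additions and the final complex-times-real-part step costs $3N$ flops, consistent with the $4N - 2 + 2T(N/2)$ count used in Sec. V.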

IV. DCT-III from new FFT

The type-III DCT (for a convenient choice of normalization)
is defined by eq. (10) for $N$ inputs $x_n$ and $N$ outputs $C_k^{\mathrm{III}}$. We
will now follow a process similar to that in the previous section
for the DCT-IV: we first express $C_k^{\mathrm{III}}$ in terms of a larger DFT
of length $4N$, then apply the new FFT algorithm of Sec. II-B,
and finally discard the redundant operations to yield an
efficient DCT-III algorithm. The resulting algorithm matches
the DCT-III flop count of our previous publication [9], which
improved upon classic algorithms by about 6% asymptotically.
Unlike our previous DCT-III algorithm, however, the algorithm
presented here also gives us an efficient scaled-output DCT-III,
which saves $M_S(N)$ operations over the classic DCT-III
algorithms.






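A. DCT-III in terms of DFT

In order to derive $C_k^{\mathrm{III}}$ from the DFT
formula, one can use the identity
$\cos\frac{\pi\ell}{N} = \frac14\left(\omega_{4N}^{2\ell} - \omega_{4N}^{2N-2\ell} - \omega_{4N}^{2N+2\ell} + \omega_{4N}^{4N-2\ell}\right)$ to write:

$$C_k^{\mathrm{III}} = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi}{N} n \left(k + \frac12\right)\right]$$
$$= x_0 + \frac14 \sum_{n=1}^{N-1} x_n \left[ \omega_{4N}^{n(2k+1)} - \omega_{4N}^{(2N-n)(2k+1)} - \omega_{4N}^{(2N+n)(2k+1)} + \omega_{4N}^{(4N-n)(2k+1)} \right]$$
$$= \sum_{n=0}^{4N-1} \tilde{x}_n \omega_{4N}^{n(2k+1)}, \qquad (17)$$

where $\tilde{x}_n$ is a real-even sequence of length $\tilde{N} = 4N$, defined
as follows for $0 < n < N$:

$$\tilde{x}_n = \tilde{x}_{4N-n} = \frac14 x_n, \qquad (18)$$

$$\tilde{x}_{2N-n} = \tilde{x}_{2N+n} = -\frac14 x_n, \qquad (19)$$

with $\tilde{x}_0 = x_0/2$, $\tilde{x}_N = 0$, $\tilde{x}_{2N} = -x_0/2$, and $\tilde{x}_{3N} = 0$.
(Notice that the definitions of $\tilde{N}$ and $\tilde{x}_n$ here are different
from those in Sec. III-A.)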

This is illustrated by an example, for $N = 4$, in Fig. 3.
This figure is very similar to Fig. 1: both of them are even
around the points $n = 0$ and $n = \tilde{N}/2$, and are odd around
the points $n = \tilde{N}/4$ and $n = 3\tilde{N}/4$. The difference from the
DCT-IV is that these points of symmetry/anti-symmetry are
now data points of the original sequence, and so the data is
not interleaved with zeros as it was for the DCT-IV. We will
refer, below, to this figure in order to illustrate what happens
when an FFT algorithm is applied to this real-symmetric data.



[Figure 3; panels: "length-4 DCT-III inputs" and "equivalent length-16 DFT".]

Fig. 3. A DCT-III of length $N = 4$ (open dots $x_0, x_1, x_2, x_3$) is equivalent
to a DFT of size $4N = 16$ (scaled by a factor of 1/4), via extending to an
odd-even-odd (square dots) periodic (gray dots) sequence and doubling the
$x_0$ term.



B. New (scaled) DCT-III algorithm

In this subsection, we will apply the new FFT algorithm
(Alg. 1) to the corresponding DFT for a DCT-III of size $N$,
as obtained in Sec. IV-A. This process is similar to what
we did in Sec. III-B. We will see that a DCT-III of size
$N$ can be calculated by three subtransforms: a DCT-III of
size $N/2$, a DCT-III of size $N/4$, and a DST-III of size
$N/4$. The resulting algorithm for the DCT-III will have the
same recursive structure as in Alg. 1: four mutually recursive
subroutines that compute the DCT-III with output scaled
by different factors. For use in the DCT-IV algorithm from
Sec. III-B, we will actually use only three of these subroutines,
because we will only need a scaled-output DCT-III and not the
original DCT-III.

When the new FFT algorithm is applied to the sequence
$\tilde{x}_n$ of length $\tilde{N} = 4N$ defined by eqs. (18-19), we get
three subtransforms: of the sequences $\tilde{x}_{2n_2}$, $\tilde{x}_{4n_4+1}$, and
$\tilde{x}_{4n_4-1}$. The DFT of the sequence $\tilde{x}_{2n_2}$ is equivalent to a
size-$N/2$ DCT-III of the original even-indexed data $x_{2n}$, as
can be seen from Fig. 3. The subtransforms of $\tilde{x}_{4n_4+1}$ and
$\tilde{x}_{4n_4-1}$ have exactly the same properties as the corresponding
subtransforms of the DCT-IV as described in Sec. III-B (except
that the length of the subtransform of $\tilde{x}_{4n_4+1}$ here is $N$ instead
of $2N$ as in the DCT-IV case). That is, we denote the DFT
of $2\tilde{x}_{4n_4+1}$ by $Z_k$, and this combines with the DFT $Z'_k = Z^*_k$
of $2\tilde{x}_{4n_4-1}$ to yield a $\mathrm{Re}(\omega_{4N}^{2k+1} Z_{2k+1})$ term in the output as
before. And, as before, the inputs $\tilde{x}_{4n_4+1}$ are anti-periodic
with length $N/2$. In consequence, we can apply the result
derived in Sec. III-B to conclude that these two subtransforms
can be found from a DCT-III of size $N/4$ and a DST-III of
size $N/4$. The corresponding inputs of the DCT-III and DST-III,
$w_n$ ($0 \le n < N/4$) and $v_n$ ($1 \le n \le N/4$), are defined
as follows [compare to eqs. (15-16)]:

$$w_0 = 2z_0, \quad w_{n>0} = 2(z_n - z_{N/2-n}), \qquad (20)$$

$$v_{n<N/4} = -2(z_n + z_{N/2-n}), \quad v_{N/4} = -2z_{N/4}, \qquad (21)$$

where $z_n = 2\tilde{x}_{4n+1}$ for $0 \le n < \tilde{N}/4$. (Again, the factors
of 2 will cancel the factor of 1/4 in the definition of $\tilde{x}_{4n+1}$.)
Therefore, the real part of $Z_{2k+1}$ is the DCT-III of $w_n$, while
the imaginary part is the DST-III of $v_n$. In summary, a DCT-III
of size $N$ can be calculated by a DCT-III of size $N/2$, a
DCT-III of size $N/4$, and a DST-III of size $N/4$. Without the
scaling factors $s$, this is equivalent to a decomposition derived
by Wang of a DCT-III of size $N$ into a DCT-III and a DCT-IV
of size $N/2$ [19], in which the DCT-IV is then decomposed
into a DCT-III and a DST-III of size $N/4$ [10].

The above discussion is independent of the scaling factor
applied to the output of the transform. So, for the various scale
factors in the different subroutines of Alg. 1, we simply scale
the DCT-III and DST-III subtransforms in the same way to
obtain similar savings in the multiplications (as quantified in
the next section). This results in the new DCT-III algorithm
presented in Algorithm 3. (The base cases, for $N = 1$ and
$N = 2$, are omitted for simplicity.)

Just as in Sec. II-B, although $\ell = 0, 1, 2$ are presented here
in a compact form by a single subroutine $\mathrm{newdctIII}^{\ell}_{N}(x)$,
in practice they would have to be implemented as separate
subroutines in order to exploit the special cases of the
multiplicative constants $s_{4N,2k+1}/s_{4\ell N,2k+1}$, similar to our FFT
[1].

C. DST-III from DCT-III

The DST-III and DCT-III are closely related. In particular, a
DST-III can be obtained from a DCT-III, with the same number
of flops, by reversing the inputs and multiplying every other
output by -1 [9], [12], [50], [51]:

$$S_k^{\mathrm{III}} = \sum_{n=1}^{N} x_n \sin\left[\frac{\pi}{N} n \left(k + \frac12\right)\right] = (-1)^k \sum_{n=0}^{N-1} x_{N-n} \cos\left[\frac{\pi}{N} n \left(k + \frac12\right)\right]. \qquad (22)$$

Similarly, one can derive a DST-III in terms of a DCT-III
algorithm for any scaling factor. Here, we present the
algorithm $\mathrm{newdstIII}^{\ell}_{N}(v)$ in terms of $\mathrm{newdctIII}^{\ell}_{N}(w)$ for
$\ell = 0, 1, 2, 4$. As we can see from Alg. 4, the new DST-III
algorithm (with scaled output) has exactly the same operation
count as the corresponding DCT-III algorithm (unary negations
are not counted in the flops, because they can be absorbed
by converting additions into subtractions or vice versa in the
preceding DCT computation). This proves our previous assertion
that the numbers of operations saved in $\mathrm{newdctIII}^{1}_{N}(w)$
and $\mathrm{newdstIII}^{1}_{N}(v)$, compared to the known unscaled
algorithms, are both the same number $M_S(N)$.
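The relation between the two type-III transforms can be checked with a few lines of code. The sketch below covers only the unscaled case and uses direct $O(N^2)$ sums with hypothetical function names; it assumes the unnormalized definitions $C_k = \sum_{n=0}^{N-1} x_n \cos[\pi n(k+\frac12)/N]$ and $S_k = \sum_{n=1}^{N} x_n \sin[\pi n(k+\frac12)/N]$.

```python
import numpy as np

def dct3(x):
    """Direct unnormalized DCT-III; x holds x_0 .. x_{N-1}."""
    N = len(x)
    return np.cos(np.pi / N * np.outer(np.arange(N) + 0.5, np.arange(N))) @ x

def dst3_direct(x):
    """Direct unnormalized DST-III; x holds x_1 .. x_N."""
    N = len(x)
    return np.sin(np.pi / N * np.outer(np.arange(N) + 0.5, np.arange(1, N + 1))) @ x

def dst3_via_dct3(x):
    """DST-III from a DCT-III: reverse the inputs, flip every other output sign."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    w = x[::-1]                      # w_k = x_{N-k} for k = 0 .. N-1
    signs = (-1.0) ** np.arange(N)   # sign flips only, no arithmetic on the data
    return signs * dct3(w)

x = np.random.default_rng(0).standard_normal(8)
print(np.allclose(dst3_direct(x), dst3_via_dct3(x)))
```

Both extra steps are a permutation and sign changes, which is why the two transforms share one flop count.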

V. Flop counts for DCT-III/IV

First, we will show that Alg. 3 gives the best previous flop
count $T(N) = 2N \log_2 N - N + 1$ for the DCT-III if the
scaling factor $s$ is set to 1. Inspection of Algorithm 2 yields a
flop count $4N - 2 + 2T(N/2)$ for the DCT-IV, and substituting
$T(N)$ gives the previous best flop count of $2N \log_2 N + N$
for the DCT-IV. Then, we will analyze how many operations
are saved when the scaling factors are included.

If $s = 1$, we can see from Alg. 3 that a DCT-III of size $N$
is decomposed into a DCT-III of size $N/2$, a DCT-III of size


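Algorithm 3 New DCT-III algorithm of length $N$. The
sub-transforms $\mathrm{newdctIII}^{\ell}_{N}(x)$ for $\ell \ne 0$ are scaled by
$s_{4\ell N,2k+1}$, respectively, while $\ell = 0$ is the final unscaled DCT-III
($s_{0,2k+1} = 1$).

function $C^{\mathrm{III}}_{k=0..N-1} \leftarrow \mathrm{newdctIII}^{\ell}_{N}(x_n)$:
  {computes DCT-III / $s_{4\ell N,2k+1}$ for $\ell = 0, 1, 2$}
  $w_0 \leftarrow x_1$
  $v_{N/4} \leftarrow x_{N-1}$
  for $k = 1$ to $N/4 - 1$ do
    $w_k \leftarrow x_{4k+1} + x_{4k-1}$
    $v_k \leftarrow x_{4k-1} - x_{4k+1}$
  end for
  $U_{k_2=0 \ldots N/2-1} \leftarrow \mathrm{newdctIII}^{2\ell}_{N/2}(x_{2n_2})$
  $W_{k_4=0 \ldots N/4-1} \leftarrow \mathrm{newdctIII}^{1}_{N/4}(w_{n_4})$
  $V_{k_4=0 \ldots N/4-1} \leftarrow \mathrm{newdstIII}^{1}_{N/4}(v_{n_4})$
  for $k = 0$ to $N/4 - 1$ do
    $Z_{2k+1} \leftarrow W_k + iV_k$
    $Z_{N-2k-1} \leftarrow W_k - iV_k$
  end for
  for $k = 0$ to $N/2 - 1$ do
    $C_k^{\mathrm{III}} \leftarrow U_k + \mathrm{Re}(t_{4N,2k+1} Z_{2k+1}) \cdot (s_{4N,2k+1}/s_{4\ell N,2k+1})$
    $C_{N-k-1}^{\mathrm{III}} \leftarrow U_k - \mathrm{Re}(t_{4N,2k+1} Z_{2k+1}) \cdot (s_{4N,2k+1}/s_{4\ell N,2k+1})$
  end for

function $C^{\mathrm{III}}_{k=0..N-1} \leftarrow \mathrm{newdctIII}^{4}_{N}(x_n)$:
  {computes DCT-III / $s_{16N,2k+1}$}
  $w_0 \leftarrow x_1$
  $v_{N/4} \leftarrow x_{N-1}$
  for $k = 1$ to $N/4 - 1$ do
    $w_k \leftarrow x_{4k+1} + x_{4k-1}$
    $v_k \leftarrow x_{4k-1} - x_{4k+1}$
  end for
  $U_{k_2=0 \ldots N/2-1} \leftarrow \mathrm{newdctIII}^{2}_{N/2}(x_{2n_2})$
  $W_{k_4=0 \ldots N/4-1} \leftarrow \mathrm{newdctIII}^{1}_{N/4}(w_{n_4})$
  $V_{k_4=0 \ldots N/4-1} \leftarrow \mathrm{newdstIII}^{1}_{N/4}(v_{n_4})$
  for $k = 0$ to $N/4 - 1$ do
    $Z_{2k+1} \leftarrow W_k + iV_k$
    $Z_{N-2k-1} \leftarrow W_k - iV_k$
  end for
  for $k = 0$ to $N/2 - 1$ do
    $C_k^{\mathrm{III}} \leftarrow [U_k + \mathrm{Re}(t_{4N,2k+1} Z_{2k+1})] \cdot (s_{4N,2k+1}/s_{16N,2k+1})$
    $C_{N-k-1}^{\mathrm{III}} \leftarrow [U_k - \mathrm{Re}(t_{4N,2k+1} Z_{2k+1})] \cdot (s_{4N,2k+1}/s_{16N,2N-2k-1})$
  end for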



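Algorithm 4 Scaled-output DST-III algorithm of size $N$, based
on the scaled-output DCT-III algorithm which is presented in
Sec. IV, with the same operation count.

function $S^{\mathrm{III}}_{k=0\ldots N-1} \leftarrow$ newdstIII$^{\ell}_{N}(x_n)$:
  {computes DST-III / $s_{4\ell N,2k+1}$}
  for $k = 0$ to $N - 1$ do
    $w_k \leftarrow x_{N-k}$
  end for
  $C^{\mathrm{III}}_{k=0\ldots N-1} \leftarrow$ newdctIII$^{\ell}_{N}(w_n)$
  for $k = 0$ to $N - 1$ do
    $S^{\mathrm{III}}_{k} \leftarrow (-1)^k C^{\mathrm{III}}_{k}$
  end for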



$N/4$ and a DST-III of size $N/4$, and all four of our recursive
subroutines are identical (they differ only by the $s$ factors). In
addition, $2(N/4 - 1)$ flops are required to obtain the sequences
$w_k$ and $v_k$, and $5N/2$ flops are required to obtain the output of
the DCT-III from the outputs of the subtransforms. Therefore,
we obtain the recurrence relation for $T(N)$:

$$T(N) = T(N/2) + 2T(N/4) + 3N - 2. \tag{23}$$

The initial conditions for $T(N)$ can be determined easily. If
$N = 1$, $y_0 = x_0$. Therefore, $T(1) = 0$. If $N = 2$, $y_0 =
x_0 + x_1/\sqrt{2}$ and $y_1 = x_0 - x_1/\sqrt{2}$. Therefore, $T(2) = 3$.
Solving eq. (23) with these initial conditions, we immediately
obtain the following result:

$$T(N) = 2N\log_2 N - N + 1. \tag{24}$$

This flop count is the same as the previous best flop count for
DCT-III algorithms [7], [10]-[13], [17], [21], [51]-[56] prior
to our work [1], [9].

Since our DCT-III algorithm without scaling factors (i.e.
with $s = 1$) obtains the same number of flops as the best
previous DCT-III algorithms, it only remains to determine
how many operations are saved by including the scale factors.
We now analyze this count of saved flops by solving
the appropriate recurrence relations. In particular, let $M(N)$,
$M_s(N)$, $M_{s2}(N)$ and $M_{s4}(N)$ (where $N$ is a power of 2)
denote the number of operations saved (or spent, if negative)
in newdctIII$^{\ell}_{N}(x)$ for $\ell = 0, 1, 2, 4$, respectively, compared
to the corresponding unscaled DCT-III algorithm.

First, let us derive the recurrence relations for $M(N)$ and
so on, similar to the analysis of Alg. 1 [1]. The number of
flops saved in newdctIII$^{\ell}_{N}(x)$ is the sum of the flops saved
in the subtransforms and the number of flops saved in the loop
to calculate the final results $C^{\mathrm{III}}_k$. In newdctIII$^{0}_{N}(x)$, $5 \cdot \frac{N}{2}$
flops are required in the loop, as in the old unscaled algorithm.
In newdctIII$^{1}_{N}(x)$, only $4 \cdot \frac{N}{2}$ flops are needed since either
the real part or the imaginary part of $t_{4N,2k+1}$ is $\pm 1$. Thus, $N/2$
flops in the loop are saved for $\ell = 1$. In newdctIII$^{2}_{N}(x)$,
$5 \cdot \frac{N}{2}$ flops are again required in the loop. (In contrast, for
Alg. 1 the $\ell = 2$ case required two more multiplications than
the $\ell = 0$ case because of the $k = 0$ term [1], which is not
present here because $2k + 1 \neq 0$.) In newdctIII$^{4}_{N}(x)$, $6 \cdot \frac{N}{2}$
flops are required in the loop since $s_{16N,2k+1} \neq s_{16N,2k+1+2N}$
and hence we must multiply the two scale factors separately.
This means that we spend $N/2$ extra multiplications in the
$\ell = 4$ case, which is counted as a negative term in $M_{s4}$.
Thus, we have the following relations:

$$
\begin{aligned}
M(N) &= M(N/2) + 2M_s(N/4) \\
M_s(N) &= M_{s2}(N/2) + 2M_s(N/4) + N/2 \\
M_{s2}(N) &= M_{s4}(N/2) + 2M_s(N/4) \\
M_{s4}(N) &= M_{s2}(N/2) + 2M_s(N/4) - N/2.
\end{aligned}
\tag{25}
$$

We next determine the number of flops saved (if any) for
the base cases, $N = 1$ and $N = 2$. When $N = 1$, the
unscaled algorithm computes the output $y_0 = x_0$, while
the algorithms newdctIII$^{\ell}_{N}(x)$ calculate $y_0 = (1/s_{4\ell,1})x_0$,
which requires the same number of flops for $\ell < 2$ and one
more multiplication for $\ell \geq 2$:

$$
\begin{aligned}
M(1) &= M_s(1) = 0, \\
M_{s2}(1) &= M_{s4}(1) = -1.
\end{aligned}
\tag{26}
$$

When $N = 2$, we obtain scale factors $s_{4\ell N,2k+1} = s_{8\ell,2k+1}$.
For $\ell < 4$, $s_{8\ell,1} = s_{8\ell,3}$, while $s_{8\ell,1} \neq s_{8\ell,3}$ for $\ell = 4$. The
unscaled algorithm calculates the output $y_{0,1} = x_0 \pm \sqrt{1/2}\,x_1$,
where 3 flops are required. For $\ell = 0, 1, 2$, the algorithms
newdctIII$^{\ell}_{N}(x)$ calculate $y_{0,1} = (x_0 \pm \sqrt{1/2}\,x_1)/s_{8\ell,1}$,
which requires 3 flops for $\ell = 0$ (where $s = 1$), 3 flops for
$\ell = 1$ (where $s = 1/\sqrt{2}$ and cancels one of the constants),
and 4 flops for $\ell = 2$. For $\ell = 4$, newdctIII$^{4}_{N}(x)$ calculates
$y_0 = (x_0 + \frac{1}{\sqrt{2}}x_1)/s_{32,1}$ and $y_1 = (x_0 - \frac{1}{\sqrt{2}}x_1)/s_{32,3}$, where
5 flops are required. Thus, we have



$$
\begin{aligned}
M(2) &= M_s(2) = 0, \\
M_{s2}(2) &= -1, \\
M_{s4}(2) &= -2.
\end{aligned}
\tag{27}
$$

With these base cases, one can solve the recurrences (25) by 
standard generating-function methods [57] to obtain: 

$$M_s(N) = \frac{1}{9}N\log_2 N - \frac{1}{27}N + \frac{1}{9}(-1)^{\log_2 N}\log_2 N + \frac{1}{27}(-1)^{\log_2 N}. \tag{28}$$

Recall from Sec. III-B that $2M_s(N/2)$ flops can be saved
in the new DCT-IV algorithm compared to the best previous
algorithms, resulting in a total flop count of $2N\log_2 N + N -
2M_s(N/2)$. This gives the DCT-IV flop count in eq. (1) for
Algorithm 2. This expression for the flop count of the new
DCT-IV algorithm matches the results that were derived by
automatic code generation for small $N$ [1], as expected.
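The recurrences (25), the base cases (26) and (27), and the resulting DCT-IV count can be cross-checked in a few lines. The sketch below is only illustrative: it iterates the recurrences in exact arithmetic, compares $M_s(N)$ with the closed form $\frac{1}{9}N\log_2 N - \frac{1}{27}N + \frac{1}{9}(-1)^{\log_2 N}\log_2 N + \frac{1}{27}(-1)^{\log_2 N}$, and evaluates $2N\log_2 N + N - 2M_s(N/2)$. The function names `saved`, `Ms_closed` and `new_dct4_flops` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def saved(N):
    """Return (M, Ms, Ms2, Ms4) for a power-of-two N via recurrences (25)."""
    if N == 1:
        return (0, 0, -1, -1)          # base cases (26)
    if N == 2:
        return (0, 0, -1, -2)          # base cases (27)
    M_h, Ms_h, Ms2_h, Ms4_h = saved(N // 2)
    Ms_q = saved(N // 4)[1]
    return (M_h + 2 * Ms_q,
            Ms2_h + 2 * Ms_q + N // 2,
            Ms4_h + 2 * Ms_q,
            Ms2_h + 2 * Ms_q - N // 2)

def Ms_closed(N):
    """Closed form for Ms(N), evaluated exactly with rational arithmetic."""
    m = N.bit_length() - 1             # log2(N) for a power of two
    sign = -1 if m % 2 else 1          # (-1)^{log2 N}
    return (Fraction(1, 9) * N * m - Fraction(1, 27) * N
            + Fraction(1, 9) * sign * m + Fraction(1, 27) * sign)

def new_dct4_flops(N):
    """New DCT-IV count: 2N log2(N) + N - 2 Ms(N/2), for N >= 2."""
    m = N.bit_length() - 1
    return 2 * N * m + N - 2 * saved(N // 2)[1]

for m in range(0, 16):
    assert saved(1 << m)[1] == Ms_closed(1 << m)
```

With these definitions, `new_dct4_flops(8)` evaluates to 54 (versus 56 previously) and `new_dct4_flops(16)` to 140 (versus 144).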

In general, as was discussed in our other work on the DCT-
II/III [9], the number of multiplications may change depending
upon the normalization chosen. For the DCT-IV, a common
normalization choice is to multiply by $\sqrt{2/N}$, which makes
the transform unitary, but this does not change the number
of flops because the normalization can be absorbed into the
$\omega_{8N}^{2k+1} s_{2N,2k+1}$ factor (which is $\neq 1$ for all $k$). On the other
hand, if one is able to scale every output of the DCT-IV
individually, for example if the scale factor can be absorbed
into a subsequent computational step, then the best choice in
the present algorithm seems to be to scale by $1/s_{8N,2k+1}$.
This choice of scale factor will transform $\omega_{8N}^{2k+1} s_{2N,2k+1}$ into
$t_{8N,2k+1}$ in Algorithm 2, which can be multiplied in one fewer
multiplication, saving $N$ multiplications overall. Similarly, one
can save $N$ multiplications for a scaled-input, unscaled-output
DCT-IV, since the scaled-output DCT-IV can be transformed
into a scaled-input DCT-IV by network transposition [58]
without changing the number of flops [9].



VI. DST-IV from DCT-IV

The (unnormalized) DST-IV is defined as: 

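$$S^{\mathrm{IV}}_k = \sum_{n=0}^{N-1} x_n \sin\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right] \tag{29}$$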



for $k = 0, \ldots, N - 1$. Although we could derive fast algorithms
for $S^{\mathrm{IV}}_k$ directly by treating it as a DFT of length $8N$ with odd
symmetry, interleaved with zeros, and discarding redundant
operations similar to above, it turns out there is a simpler
technique. The DST-IV is exactly equivalent to a DCT-IV
in which the outputs are reversed and every other input is
multiplied by $-1$ (or vice versa) [12]:



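$$S^{\mathrm{IV}}_{N-1-k} = \sum_{n=0}^{N-1} (-1)^n x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right] \tag{30}$$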

for $k = 0, \ldots, N - 1$. It therefore follows that a DST-IV can
be computed with the same number of flops as a DCT-IV of
the same size, assuming that multiplications by $-1$ are free;
the reason for this is that sign flips can be absorbed at no cost
by converting additions into subtractions or vice versa in the
subsequent algorithmic steps. Therefore, our new flop count
(1) immediately applies to the DST-IV.
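Identity (30) follows from $\sin[\pi(n+\frac12) - \theta] = (-1)^n \cos\theta$ and is easy to confirm numerically. The sketch below is an illustration only: it uses direct $O(N^2)$ evaluations of the unnormalized definitions rather than the fast algorithm, and the function names are hypothetical.

```python
import math

def dct4(x):
    """Direct O(N^2) unnormalized DCT-IV."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * (k + 0.5))
                for n in range(N))
            for k in range(N)]

def dst4(x):
    """Direct O(N^2) unnormalized DST-IV, eq. (29)."""
    N = len(x)
    return [sum(x[n] * math.sin(math.pi / N * (n + 0.5) * (k + 0.5))
                for n in range(N))
            for k in range(N)]

def dst4_via_dct4(x):
    """DST-IV via eq. (30): negate every other input, then reverse the outputs."""
    flipped = [-v if n % 2 else v for n, v in enumerate(x)]   # (-1)^n x_n
    c = dct4(flipped)
    return c[::-1]                      # S_{N-1-k} = C_k, so S_k = C_{N-1-k}

x = [1.0, -0.5, 2.25, 0.75, -3.0, 0.125, 1.5, -1.25]
max_err = max(abs(a - b) for a, b in zip(dst4(x), dst4_via_dct4(x)))
```

Only sign flips and a reordering are involved, which is why no flops are added beyond those of the DCT-IV itself.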

VII. MDCT from DCT-IV

In this section, we will present a new modified DCT 
(MDCT) algorithm in terms of our new DCT-IV algorithm 
with an improved flop count compared to the best previously 
published counts. The key fact is that the best previous flop 
count for an MDCT of $2N = 2^m$ inputs and $N$ outputs
was obtained by reducing the problem to a DCT-IV plus $N$
extra additions [29]-[32]. Therefore, our improved DCT-IV
algorithm immediately gives an improved MDCT. Similarly 
for the inverse MDCT, except that in that case no extra 
additions are required. 

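An MDCT of "length" $N$ has $2N$ inputs $x_n$ ($0 \le n < 2N$)
and $N$ outputs $C_k$ ($0 \le k < N$) defined by the formula (not
including normalization factors):

$$C_k = \sum_{n=0}^{2N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right] \tag{31}$$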



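This is "inverted" by the inverse MDCT (IMDCT), which takes
$N$ inputs $C_k$ and gives $2N$ outputs $y_n$, defined by (again not
including normalization):

$$y_n = \sum_{k=0}^{N-1} C_k \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right] \tag{32}$$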



These transforms are designed to operate on consecutive 
50%-overlapping blocks of data, and when the IMDCTs of 
subsequent blocks are added in their overlapping halves the 
resulting "time-domain aliasing cancellation" (TDAC) yields 
the original data $x_n$ [26], [27]. The MDCT is widely used
in audio compression, where the overlapping reduces artifacts 
from the block boundaries [28]. 

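The MDCT and IMDCT can be trivially re-expressed in
terms of a DCT-IV of size $N$ [29]-[32]. Let us define

$$S_n = \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right],$$

which has the symmetry $S_{2N+n} = S_{2N-1-n} = -S_n$. In
terms of $S_n$, the MDCT becomes

$$
\begin{aligned}
C_k &= \sum_{n=0}^{2N-1} x_n S_{n+N/2} \\
&= \sum_{n=0}^{N/2-1} S_{N/2+n}\,(x_n - x_{N-1-n}) - \sum_{n=0}^{N/2-1} S_n\,(x_{3N/2-1-n} + x_{3N/2+n}) \\
&= \sum_{n=N/2}^{N-1} S_n\,(x_{n-N/2} - x_{3N/2-1-n}) - \sum_{n=0}^{N/2-1} S_n\,(x_{3N/2-1-n} + x_{3N/2+n}).
\end{aligned}
$$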



But the final summation is simply a DCT-IV of the sequence
$\tilde{x}_n$ defined by $\tilde{x}_n = -(x_{3N/2-1-n} + x_{3N/2+n})$ for $0 \le n <
N/2$ and $\tilde{x}_n = x_{n-N/2} - x_{3N/2-1-n}$ for $N/2 \le n < N$.



Therefore, given any algorithm for a DCT-IV, the MDCT can
be computed with at most $N$ extra additions. (Here, we are
not counting multiplication by $-1$, because negations can be
absorbed by converting additions into subtractions and vice
versa in subsequent computational steps.) Since the previous
best flop count for the DCT-IV was $2N\log_2 N + N$ flops, this
led to a flop count of $2N\log_2 N + 2N$ for the MDCT [29]-
[32]. Instead, we can use our new algorithm for the DCT-
IV to immediately reduce this flop count for the MDCT to
eq. (1)$+N$:



$$\frac{17}{9}N\log_2 N + \frac{58}{27}N + \frac{2}{9}(-1)^{\log_2 N}\log_2 N - \frac{4}{27}(-1)^{\log_2 N}. \tag{33}$$
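The folding of the $2N$ MDCT inputs into a length-$N$ DCT-IV input can be sketched as follows. This is only an illustration of the index bookkeeping under the unnormalized definition (31): the DCT-IV here is a direct $O(N^2)$ sum standing in for any fast DCT-IV, $N$ is assumed even, and the function names are hypothetical.

```python
import math

def dct4(x):
    """Direct O(N^2) unnormalized DCT-IV (stand-in for a fast algorithm)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * (k + 0.5))
                for n in range(N))
            for k in range(N)]

def mdct_direct(x):
    """MDCT evaluated straight from definition (31); len(x) = 2N."""
    N = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def mdct_via_dct4(x):
    """MDCT as a DCT-IV of the folded sequence, using N additions/subtractions."""
    N = len(x) // 2
    h = N // 2                                    # h = N/2, so 3N/2 = 3*h
    folded = [0.0] * N
    for n in range(h):                            # 0 <= n < N/2
        folded[n] = -(x[3 * h - 1 - n] + x[3 * h + n])
    for n in range(h, N):                         # N/2 <= n < N
        folded[n] = x[n - h] - x[3 * h - 1 - n]
    return dct4(folded)

x = [math.sin(0.7 * n) + 0.1 * n for n in range(16)]   # 2N = 16 inputs, N = 8
max_err = max(abs(a - b) for a, b in zip(mdct_direct(x), mdct_via_dct4(x)))
```

Each of the $N$ folded entries costs one addition or subtraction (the leading minus sign is a negation and is not counted), which matches the $N$ extra additions in the flop count above.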



The IMDCT requires almost no manipulation: it is already 
in the form of a DCT-IV, except that we are evaluating the 
DCT-IV beyond the "end" of the inputs. Since a DCT-IV 
corresponds to anti-symmetric data as in Fig. 1, this just means
that we compute the DCT-IV and obtain the IMDCT by storing
the outputs and their mirror image (multiplied by —1), shifted 
by N/2. So, the flop count for the IMDCT is exactly the same 
as the flop count for the DCT-IV, not counting negations. Any 
overall negation of an output can be eliminated by converting 
a preceding addition to a subtraction (or changing the sign of 
a preceding constant), but some of the (redundant) IMDCT 
outputs are needed with both signs, which seems to imply 
that an explicit negation is required. The latter negations are 
easily eliminated in practice, however: an IMDCT is always 
followed in practice by adding overlapping IMDCT blocks to 
achieve TDAC, so the negations simply mean that some of 
these additions are converted into subtractions. 
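The statement that the IMDCT is a DCT-IV evaluated beyond the "end" of its outputs can be illustrated with a short sketch. It relies on the extension symmetries $D_{2N-1-m} = -D_m$ and $D_{2N+m} = -D_m$ of the unnormalized DCT-IV outputs $D_m$, assumes $N$ even, and uses a direct $O(N^2)$ DCT-IV as a stand-in for a fast one. The function names are hypothetical.

```python
import math

def dct4(x):
    """Direct O(N^2) unnormalized DCT-IV (stand-in for a fast algorithm)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * (k + 0.5))
                for n in range(N))
            for k in range(N)]

def imdct_direct(C):
    """IMDCT evaluated straight from definition (32); returns 2N outputs."""
    N = len(C)
    return [sum(C[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for k in range(N))
            for n in range(2 * N)]

def imdct_via_dct4(C):
    """IMDCT from one DCT-IV: copy outputs and their negated mirror image."""
    N = len(C)
    h = N // 2                                    # h = N/2
    d = dct4(C)
    y = [0.0] * (2 * N)
    for n in range(h):                            # y_n = D_{n+N/2}
        y[n] = d[n + h]
    for n in range(h, 3 * h):                     # y_n = -D_{3N/2-1-n} (mirror image)
        y[n] = -d[3 * h - 1 - n]
    for n in range(3 * h, 2 * N):                 # y_n = -D_{n-3N/2}
        y[n] = -d[n - 3 * h]
    return y

C = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 3.0, -2.5]
max_err = max(abs(a - b) for a, b in zip(imdct_direct(C), imdct_via_dct4(C)))
```

No additions or multiplications appear outside `dct4`; the explicit negations in this sketch are the ones that, as noted above, can be folded into the subsequent overlap-add step.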

VIII. Concluding remarks 

We have derived new algorithms for the DCT-IV, DST-
IV, MDCT, and IMDCT that reduce the flops for a size
$N = 2^m$ from $2N\log_2 N + O(N)$ to $\frac{17}{9}N\log_2 N + O(N)$,
representing the first improvement in their flop counts for 
many years and stemming from similar developments for the 
DFT [1], [3]. We do not claim that these flop counts are 
the best possible, although we are not currently aware of 
any way to obtain further reductions in either the leading 
coefficient or in our 0{N) terms. It is possible that further 
gains could be made by extending our recursive rescaling 
technique to greater generality, for example. However, we 
believe that such investigations will be most easily carried out 
in the context of the DFT, since FFT algorithms (in terms of 
complex exponentials) are typically much easier to work with 
than fast DCT algorithms (in terms of real trigonometry), and 
any improved FFT algorithm will immediately lead to similar
gains for DCTs (and vice versa: any improved DCT leads to 
an improved FFT) [8]. Another open question is whether these 
new algorithms will lead to practical gains in performance on 
real computers. This is a complicated and somewhat ill-defined 
question, because performance characteristics vary between 
machines and depend strongly on many factors besides flop 
counts; any simple algorithms like the ones presented here
require extensive restructuring to make them efficient on 
real computers, just as classic split-radix does not perform
well without modification [2]. On the other hand, for small 
fixed N where straight-line (unrolled) hard-coded kernels are 
often employed in audio and image processing (where the 
block size is commonly fixed), we have demonstrated that 
automatic code-generation techniques (given only the new 
FFT) can produce efficient DCT-IV (and MDCT, etc.) kernels 
attaining the new operation counts, and that the performance 
is sometimes improved at least slightly [1].